VARIABLE TYPES AND DATATYPES

Each variable in a ViSta data object has a "type", referred to as the "variable type" of the variable. Each data object also has a "type", called the "datatype". 

Don't confuse these two "types". Variable types are fundamental. They are immutable characteristics of the variables. On the other hand, the datatype is an immutable characteristic of a dataset, which is derived from the variable types and from other information.


VARIABLE TYPES

One of the basic pieces of information that ViSta needs for every aspect of what it does is the "type" of each variable in the data. The "variable type" must be exactly one of the following three choices:

CATEGORY - The information provided by each observation of a variable only specifies the category of the observation, nothing more... not the order of the observations nor the values of the observations.

ORDINAL  - The information provided by each observation of a variable specifies the order as well as the category of the observation, but not the numeric value.

NUMERIC  - The information provided by each observation of a variable specifies the numeric value as well as the order and category of the observation.

Note that currently ViSta treats ORDINAL variables as though they are NUMERIC (with rare, but obvious, exceptions).
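
The three variable types form a hierarchy: each type carries all the information of the one before it, plus something more. The Python sketch below is only an illustration of that idea, not ViSta code. The names VariableType, conveys_order and effective_type are hypothetical, invented for this example.

```python
from enum import IntEnum


class VariableType(IntEnum):
    """Hypothetical model of the three variable types, ranked by information."""
    CATEGORY = 1  # category only
    ORDINAL = 2   # category and order
    NUMERIC = 3   # category, order and numeric value


def conveys_order(vtype):
    """True if observations of this type carry order information."""
    return vtype >= VariableType.ORDINAL


def effective_type(vtype):
    """Type used in practice: ORDINAL is currently treated as NUMERIC.

    The rare exceptions mentioned in the text are not modeled here.
    """
    if vtype == VariableType.ORDINAL:
        return VariableType.NUMERIC
    return vtype
```

Under these assumptions, effective_type(VariableType.ORDINAL) gives VariableType.NUMERIC, while CATEGORY and NUMERIC are returned unchanged.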


DATATYPES

Each ViSta data object has a datatype. The datatype determines ViSta's default analyses and visualizations, and limits the choice of actions you can take to those that are likely to be reasonable. 

Datatypes are meant to simplify the user's experience... they allow ViSta to make a more educated guess about what should be done with the data, and provide an unobtrusive way of guiding the user by limiting choices.  

The datatype depends on
1) whether the data contain missing values;
2) the mix of variable types; and
3) whether the values for numeric variables represent quantity, frequency or relatedness.
Note that the definition of "frequency" data has been retro-fitted to ViSta and is a bit of a kludge. Specifically, data are frequencies if they are explicitly declared so in the datacode, or if all numeric variables are named "Freq".
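
The frequency rule just described can be sketched in Python as follows. This is an illustration only, not ViSta's own code. The function name is_frequency_data and its arguments are hypothetical. Two details are assumptions of this sketch rather than facts from the text: the name comparison is an exact, case-sensitive match, and data with no numeric variables at all are not counted as frequencies.

```python
def is_frequency_data(numeric_names, declared_freq=False):
    """Hypothetical check of whether data should be treated as frequencies.

    numeric_names -- names of the numeric variables in the data
    declared_freq -- True if the datacode explicitly declares frequency data
    """
    if declared_freq:
        return True
    # Assumption: at least one numeric variable must exist, and the
    # comparison with "Freq" is exact and case-sensitive.
    return len(numeric_names) > 0 and all(name == "Freq" for name in numeric_names)
```

For example, is_frequency_data(["Freq"]) is true, is_frequency_data(["Freq", "Height"]) is false, and is_frequency_data(["Height"], declared_freq=True) is true.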

ViSta recognizes 11 datatypes:

A) MISSING - The data have one or more NIL elements

B) MATRIX - A datatype where the basic datum is a relation. The observations and variables refer to the same things. The data elements are relational: each one specifies the relation (correlation, distance, covariance) between a pair of the things.

C) Six datatypes where the basic datum is a quantity: 
   CATEGORY - There are only category variables
   UNIVARIATE - There is one variable and it is non-frequency numeric
   BIVARIATE - There are two variables and they are both non-frequency numeric
   MULTIVARIATE - There are more than two variables. All are non-frequency numeric
   CLASSIFICATION - There is exactly 1 non-frequency numeric variable and there are 1 or more category variables
   GENERAL - When none of the above definitions is satisfied the datatype is "general"

D) Three datatypes where the basic datum is a frequency: 
   FREQUENCY - All the variables are numeric frequency variables
   FREQCLASS - There is exactly 1 numeric frequency variable and there are 1 or more category variables
   CROSSTABS - There are 1 or more numeric frequency variables and 1 or more category variables

Formally, the datatypes are defined to be:

A) MISSING if the data contain one or more NIL elements.
B) MATRIX if matrices are present without NIL elements.
C) If neither of the above is true, then the datatype is defined according to the number of category and numeric variables in the data, and by whether the data are frequencies or not. Specifically:

Number of |       |          Number of Numeric Variables
Category  | freq? |
Variables |       |  0            1            2            >2
----------+-------+----------------------------------------------------
    0     |  nil  |  error        univariate   bivariate    multivariate
          |   t   |  error        freq         freq         freq
   >0     |  nil  |  category     class        general      general
          |   t   |  category     freqclass    crosstabs    crosstabs
 NOTES:
 a) TABLE datatype is no longer used.
 b) ORDINAL variables are treated as NUMERIC.
 c) The datatype is defined on ALL variables, NOT just the active variables. This makes the datatype a characteristic of the data, not of the active data. I am considering changing this in the future.
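
The formal definition above can be read as a small decision procedure. The Python sketch below is one way to write it down. It is an illustration of the rules and the table, not ViSta's implementation. The function name vista_datatype and its arguments are hypothetical, and the returned strings are the labels used in the table. Following the notes, ORDINAL variables are assumed to be counted as numeric, and the counts are assumed to cover all variables, not just the active ones.

```python
def vista_datatype(n_category, n_numeric, freq=False,
                   has_missing=False, is_matrix=False):
    """Hypothetical lookup of the datatype from the formal definition.

    n_category  -- number of category variables (all variables, not just active)
    n_numeric   -- number of numeric variables, with ORDINAL counted as numeric
    freq        -- True if the data are frequencies
    has_missing -- True if the data contain one or more NIL elements
    is_matrix   -- True if matrices are present
    """
    # Rules A and B are checked first, in that order.
    if has_missing:
        return "missing"
    if is_matrix:
        return "matrix"

    # Rule C: the table, read column by column.
    if n_numeric == 0:
        if n_category == 0:
            raise ValueError("no variables: the table marks this cell as an error")
        return "category"

    if n_category == 0:
        if freq:
            return "freq"
        if n_numeric == 1:
            return "univariate"
        if n_numeric == 2:
            return "bivariate"
        return "multivariate"

    # One or more category variables and one or more numeric variables.
    if n_numeric == 1:
        return "freqclass" if freq else "class"
    return "crosstabs" if freq else "general"
```

For example, vista_datatype(0, 3) gives "multivariate", vista_datatype(2, 1) gives "class", and vista_datatype(2, 3, freq=True) gives "crosstabs".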
